Nearly Optimal Semi-Supervised Learning on Subgraphs
Abstract
The harmonic solution (HS) on a graph is one of the most popular approaches to semi-supervised learning. This is the first paper that studies how to identify highly confident HS predictions on a graph based on the HS on its subgraph. The premise of our method is that the subgraph is much smaller than the graph and therefore the most confident predictions can be identified much faster than computing the HS on the graph. We introduce a class of subgraphs that allow for good approximations, prove bounds on the difference in the HS on the graph and its subgraph, and propose an efficient approach to building the subgraphs. Our solution is evaluated in the domains of handwritten digit recognition, and topic discovery in restaurant and hotel reviews. In all cases, we show that only a small portion of the graph is sufficient to identify highly confident predictions.
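For readers unfamiliar with the harmonic solution, the following is a minimal sketch of the standard unregularized formulation on a small graph. It is not the paper's subgraph method. The function name `harmonic_solution`, the toy path graph, and the use of |f| as a confidence score are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def harmonic_solution(W, labeled, y):
    """Harmonic solution on a weighted graph (illustrative sketch).

    W       : symmetric, non-negative (n x n) weight matrix
    labeled : indices of the labeled vertices
    y       : labels of those vertices, here in {-1, +1}
    Returns a vector f with f[labeled] = y and each unlabeled value
    equal to the weighted average of its neighbors' values.
    """
    W = np.asarray(W, dtype=float)
    labeled = np.asarray(labeled)
    y = np.asarray(y, dtype=float)
    n = W.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), labeled)

    # Graph Laplacian L = D - W, where D is the diagonal degree matrix.
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled)]

    f = np.zeros(n)
    f[labeled] = y
    # Solve L_uu f_u = W_ul y_l instead of forming an explicit inverse.
    f[unlabeled] = np.linalg.solve(L_uu, W_ul @ y)
    return f

# Toy example (hypothetical data): a path graph 0-1-2-3-4 with unit weights,
# vertex 0 labeled +1 and vertex 4 labeled -1.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

f = harmonic_solution(W, labeled=[0, 4], y=[1.0, -1.0])

# Assumption: treat |f_i| as the confidence of the prediction sign(f_i).
confidence = np.abs(f)
print(f)
print(confidence)
```

On this path graph the solution interpolates linearly between the two labels, so the middle vertex has a value of zero and is the least confident prediction. The linear solve on the full graph is the cost that the paper aims to avoid by working on a much smaller subgraph.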
Similar resources
Efficient semi-supervised learning on locally informative multiple graphs
We address an issue of semi-supervised learning on multiple graphs, over which informative subgraphs are distributed. One application under this setting can be found in molecular biology, where different types of gene networks are generated depending upon experiments. Here an important problem is to annotate unknown genes by using functionally known genes, which connect to unknown genes in gene...
Graph Partition Neural Networks for Semi-Supervised Classification
We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally propagating information between the subgraphs. To efficiently partition graphs, we experiment with several partitioning algorithms and also propose a novel vari...
Semisupervised learning using feature selection based on maximum density subgraphs
We present a new graph based semi-supervised learning algorithm, using multiway cut on a neighborhood graph to achieve an optimum classification. We also present a graph based feature selection algorithm utilizing the global structure of the graph derived from both labeled and unlabeled examples. With respect to the experiments we conducted, both of our approaches are proved to have a promising...
Data dependent kernels in nearly-linear time
We propose a method to efficiently construct data-dependent kernels which can make use of large quantities of (unlabeled) data. Our construction makes an approximation in the standard construction of semi-supervised kernels in Sindhwani et al. (2005). In typical cases these kernels can be computed in nearly-linear time (in the amount of data), improving on the cubic time of the standard constru...
Semi-supervised learning of hierarchical representations of molecules using neural message passing
With the rapid increase of compound databases available in medicinal and material science, there is a growing need for learning representations of molecules in a semi-supervised manner. In this paper, we propose an unsupervised hierarchical feature extraction algorithm for molecules (or more generally, graph-structured objects with fixed number of types of nodes and edges), which is applicable ...